Selection of Best Outlier Detection Method Using Regression Analysis

نویسندگان

  • Fahad Sultan
  • Mudassir Ahmed
چکیده

Outliers are unusual data values that are inconsistent with most of the records. Such non-representative records can seriously affect the model to be produced, so detecting outlier is a significant job to achieve higher accuracy. Several outlier detection methods are used in literature for real as well as simulated data sets. The aim of this study is to compare the two outlier detection method i.e. Cook’s Distance and Mahalnobis method with standard method for outlier detection. The 15 replicates for simple linear regression of total household expenditure and household size are used to find the most efficient outlier detection method. It is found that the standard method of outlier detection produce less residual to predict the actual values as compared to the other two methods. While among the other two methods Cook’s method produce less prediction error as compared to the Mahalanobis methods. Keywords-Outlier; Outlier Detection; Regression; Analysis; Data; Applications

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Outlier Detection by Boosting Regression Trees

A procedure for detecting outliers in regression problems is proposed. It is based on information provided by boosting regression trees. The key idea is to select the most frequently resampled observation along the boosting iterations and reiterate after removing it. The selection criterion is based on Tchebychev’s inequality applied to the maximum over the boosting iterations of ...

متن کامل

A statistical test for outlier identification in data envelopment analysis

In the use of peer group data to assess individual, typical or best practice performance, the effective detection of outliers is critical for achieving useful results. In these ‘‘deterministic’’ frontier models, statistical theory is now mostly available. This paper deals with the statistical pared sample method and its capability of detecting outliers in data envelopment analysis. In the prese...

متن کامل

Analysis of a Problem Using Various Visions

 In this paper an applied problem, where the response of interest is the number of success in a specific experiment, is considered and by various visions is studied. The effects of outlier values of response on results of a regression analysis are so important to be studied. For this reason, using diagnostic methods, outlier response values are recognized. It is shown that use of arc-sine ...

متن کامل

Predicting software project effort: A grey relational analysis based method

The inherent uncertainty of the software development process presents particular challenges for software effort prediction. We need to systematically address missing data values, outlier detection, feature subset selection and the continuous evolution of predictions as the project unfolds, and all of this in the context of data-starvation and noisy data. However, in this paper, we particularly ...

متن کامل

Identification of outliers types in multivariate time series using genetic algorithm

Multivariate time series data, often, modeled using vector autoregressive moving average (VARMA) model. But presence of outliers can violates the stationary assumption and may lead to wrong modeling, biased estimation of parameters and inaccurate prediction. Thus, detection of these points and how to deal properly with them, especially in relation to modeling and parameter estimation of VARMA m...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010